Measuring the Semantic Distance between Languages from a Statistical Analysis of Bilingual Dictionaries
Abstract
A bilingual dictionary is a valuable linguistic resource which records, among other things, the differences in the segmentation of semantic space by the two languages and hence the difficulty in producing faithful translations between the two languages. Statistical analysis of nearly a hundred dictionaries has allowed us to determine how best to measure the semantic distance between languages from bilingual dictionaries. The distribution of the number of words in language A having n translations in language B, for n = 1, 2, 3, etc., was found to have a specific shape depending on the semantic distance between the two languages. A sample of only a thousand words was sufficient to obtain an estimate of semantic distance. We give a theoretical justification for this distance based on models of the historical evolution of monolingual and bilingual dictionaries. Among our linguistic findings, we discovered, for example, that French is semantically closer to Basque than to German. We envisage an application of our semantic distance measure in the choice of an intermediate language when performing indirect translation, i.e. translating from language A to language B via a third language C.
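The quantity the abstract builds on, the distribution of the number of headwords in language A that have n translations in language B, can be tabulated directly from a dictionary sample. The following is a minimal sketch in Python, assuming the sample is held as a mapping from each headword to its list of translations. The function name `translation_count_distribution` and the toy French-English entries are hypothetical illustrations, not data or code from the paper. The abstract does not state how the distance itself is derived from this distribution, so the sketch stops at the distribution.

```python
from collections import Counter


def translation_count_distribution(dictionary):
    """Return {n: fraction of headwords with exactly n distinct translations}.

    `dictionary` maps each headword in language A to a list of its
    translations in language B (an assumed representation).
    """
    # Count how many headwords have 1, 2, 3, ... distinct translations.
    counts = Counter(len(set(translations)) for translations in dictionary.values())
    total = sum(counts.values())
    # Normalise the counts into fractions of the sample.
    return {n: counts[n] / total for n in sorted(counts)}


# Hypothetical toy sample; the abstract reports that about a thousand
# words were sufficient for an estimate.
sample = {
    "chat": ["cat"],
    "maison": ["house", "home"],
    "bois": ["wood", "forest", "timber"],
    "livre": ["book"],
}

dist = translation_count_distribution(sample)
print(dist)
```

On a real sample, the shape of this distribution (for instance, how quickly the fractions fall off as n grows) is what the abstract relates to semantic distance.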
Similar resources
On multiword lexical units and their role in maritime dictionaries
Multi-word lexical units are a typical feature of specialized dictionaries, in particular monolingual and bilingual maritime dictionaries. The paper studies the concept of the multi-word lexical unit and considers the similarities and differences of their selection and presentation in monolingual and bilingual maritime dictionaries. The work analyses such issues as the classification of multi-w...
Generating Cross-lingual Concept Space from Parallel Corpora on the Web
The information available in languages other than English on the World Wide Web is increasing significantly. To cross language boundaries between different languages, dictionaries are the most typical tools. However, the general-purpose dictionary is less sensitive in genre and domain and it is impractical to manually construct tailored bilingual dictionaries or sophisticated multilingual thesa...
Pivot-Based Bilingual Dictionary Extraction from Multiple Dictionary Resources
High-quality bilingual dictionaries are rarely available for lower-density language pairs, especially for those that are closely related. Using a third language as a pivot to link two other languages is a well-known solution, and usually requires only two input bilingual dictionaries to automatically induce the new one. This approach, however, produces many incorrect translation pairs because th...
The apertium bilingual dictionaries on the web of data
Bilingual electronic dictionaries contain collections of lexical entries in two languages, with explicitly declared translation relations between such entries. Nevertheless, they are typically developed in isolation, in their own formats and accessible through proprietary APIs. In this paper we propose the use of Semantic Web techniques to make translations available on the Web to be consumed b...
Mapping Words Between Slovak Text and its Translation to English
Word alignment in texts translated into different languages is used in various applications such as cross-language information retrieval. To search for equivalent words in text translations, various statistical methods, methods based on the position of words in phrases, and methods based on bilingual dictionaries are used. However, it is very difficult to use these methods in languages with big morpholo...
Journal title:
- Journal of Quantitative Linguistics
Volume 15
Publication date: 2008